ABSTRACT

The form of the primordial power spectrum has the potential to differentiate strongly between
competing models of perturbation generation in the early universe and so is of considerable
importance. The recent release of five years of WMAP observations has confirmed the general
picture of the primordial power spectrum as deviating slightly from scale invariance, with
a spectral tilt parameter of $n_s \sim 0.96$. Nonetheless, many attempts have been made to isolate
further features such as breaks and cutoffs using a variety of methods, some employing more
than $\sim 10$ varying parameters. In this paper we apply the robust technique of Bayesian model
selection to reconstruct the optimal degree of structure in the spectrum. We model the spectrum
simply and generically as piecewise linear in $\ln k$ between 'nodes' in $k$-space whose
amplitudes are allowed to vary. The number of nodes and their $k$-space positions are chosen
by the Bayesian evidence, so that we can identify both the complexity and location of any
detected features. Our optimal reconstruction contains, perhaps surprisingly, few features, the
data preferring just three nodes. This reconstruction allows for a degree of scale dependence
of the tilt, with the 'turn-over' scale occurring around $k \sim 0.016$ Mpc$^{-1}$. More structure is
penalised by the evidence as over-fitting the data, so there is currently little point in attempting
reconstructions that are more complex.

Key words: methods: data analysis - methods: statistical - cosmology: - cosmic microwave 
background 



1 INTRODUCTION 

The recent release by the Wilkinson Microwave Anisotropy Probe
(WMAP) of five years of observations has confirmed that the primordial
spectrum of density perturbations is consistent with being
purely adiabatic and close to scale invariant, in perfect harmony
with the simplest inflationary scenarios. This agreement appears
remarkably robust when extended to independent datasets such as
measures of the matter power spectrum from galaxy redshift surveys
(Tegmark et al. 2006). Alternative models of the spectrum
containing various features have been considered. These include
an exponential large scale cutoff (Efstathiou 2003a) to explain the
quadrupole power decrement, and theoretically motivated spectra
to model the inflationary potential (Nicholson & Contaldi 2008) or
account for discontinuities from early universe phase transitions
(Barriga et al. 2001). Reconstructions of the spectrum, limiting a
priori assumptions about its structure, have typically involved fitting
some basis functions, such as wavelets (Mukherjee & Wang
2003), some deconvolution method (Shafieloo & Souradeep 2004;
Tocchini-Valentini et al. 200?) or directly 'binning' the spectrum
into an arbitrary number of band powers (Bridle et al. 2003). However,
most previous methods fail to account for Occam's razor, since
they assume that more complexity, and typically more 'detected'
features, are necessarily important in explaining the data. Recently
Verde & Peiris (2008) reconstructed the spectrum while minimising
the level of complexity needed via a cross-validation with a
'hold-out' portion of the data. This approach is a timely progression,
but in this paper we attempt a more statistically robust procedure
with an optimal reconstruction using the Bayesian evidence
to decide how much detail one should fit and where it is located in
$k$-space, based solely on the data.



2 PARAMETERISATION OF THE PRIMORDIAL 
SPECTRUM 

Inflationary models generically predict the initial spectrum of
scalar density perturbations to be close to scale invariant with just
a slight scale dependence, commonly called tilt: a red (blue) tilt for
decreasing (increasing) amplitude at smaller scales. Theoretical
motivation for this form is found in the slow-roll formulation of
inflation. Previous studies (e.g. Leach et al. 2002; Peiris & Easther
2006) have used spectral models defined explicitly by the physical
slow-roll parameters, but here we define the spectrum essentially
empirically using a spectral amplitude $A_s$, a spectral index or tilt
parameter $n_s$ and a running parameter $n_{run} = dn_s/d\ln k$ denoting any

Table 1. Priors of the base cosmological parameters.

$0.018 \le \Omega_b h^2 \le 0.032$
$0.04 \le \Omega_{dm} h^2 \le 0.16$
$0.98 \le \theta \le 1.1$
$0.01 \le \tau \le 0.5$
$-0.1 \le \Omega_k \le 0.1$



tilt scale dependence:

$$\mathcal{P}(k) = A_s \left(\frac{k}{k_0}\right)^{n_s - 1 + \frac{1}{2} n_{run} \ln(k/k_0)}, \qquad (1)$$



where $k_0$ denotes the scale about which the tilted spectrum pivots,
which throughout we set at 0.05 Mpc$^{-1}$. It has been shown previously
(Trotta 2007) that this parameterisation, although not physical
in itself, does within suitable prior ranges adequately model the
inflationary primordial spectrum.

The parameterisation described by Eqn. 1 encompasses the
most commonly tested power spectra, namely: the scale invariant
or Harrison-Zel'dovich spectrum (in which $1 - n_s = n_{run} = 0$),
the tilted spectrum ($n_{run} = 0$) and a running spectrum in which the
tilt becomes a function of scale ($n_{run} \neq 0$). To these we can add a
'cutoff' spectrum, which allows $\mathcal{P}(k)$ to drop to zero below some
variable cutoff scale and behaves like a tilted spectrum above it.
We shall use this as a simple test as to whether the addition
of some cutoff feature is actually required by the data.
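
The four basic spectra listed above can be written down in a few lines. The following is a minimal sketch, assuming Eqn. 1 with the pivot scale of 0.05 Mpc$^{-1}$ quoted in the text; the function name, the amplitude and the running value are illustrative choices and are not taken from the analysis.

```python
import numpy as np

K_PIVOT = 0.05  # pivot scale k_0 in Mpc^-1, the value adopted in the text


def power_spectrum(k, a_s, n_s=1.0, n_run=0.0, k_c=0.0):
    """Eqn. 1, with an optional sharp cutoff: zero power for k < k_c."""
    k = np.asarray(k, dtype=float)
    x = np.log(k / K_PIVOT)
    # (k/k_0)**(n_s - 1 + 0.5 * n_run * ln(k/k_0)), written as an exponential
    spectrum = a_s * np.exp((n_s - 1.0 + 0.5 * n_run * x) * x)
    return np.where(k < k_c, 0.0, spectrum)


k = np.logspace(-4, 0, 9)            # wavenumbers in Mpc^-1
a_s = 2.3e-9                         # illustrative amplitude, not a fitted value
hz = power_spectrum(k, a_s)                                # Harrison-Zel'dovich
tilted = power_spectrum(k, a_s, n_s=0.96)                  # red tilt
running = power_spectrum(k, a_s, n_s=0.96, n_run=-0.02)    # tilt with running
cutoff = power_spectrum(k, a_s, n_s=0.96, k_c=2.7e-4)      # large-scale cutoff
```

Setting the defaults to $n_s = 1$, $n_{run} = 0$ and $k_c = 0$ makes each model a special case of the same function, which mirrors the nesting of the models described in the text.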

In this paper, however, we are primarily interested in determining
structure in the primordial spectrum using an optimal
model-free reconstruction. We use the Bayesian evidence as a
discriminator in fitting a simple spectrum based on linear interpolation
between a set of amplitude-varying nodes in $k$-space. This is essentially
the same binning format as that used previously by a number of authors
(Bridle et al. 2003; Bridges et al. 2006; Bridges et al. 2007;
Spergel et al. 2007); however, here we aim to allow the data
to decide upon the location and number of nodes via the evidence.
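
A minimal sketch of such a node-based spectrum is given here, assuming that the amplitude itself (rather than its logarithm) is interpolated linearly in $\ln k$, as the abstract's "piecewise linear in $\ln k$" suggests. The function name and the amplitude values are hypothetical.

```python
import numpy as np


def interpolated_spectrum(k, node_k, node_amp):
    """Amplitude interpolated linearly in ln k between nodes.

    Outside the outermost nodes np.interp holds the end values constant.
    """
    node_k = np.asarray(node_k, dtype=float)
    order = np.argsort(node_k)                 # np.interp needs increasing abscissae
    ln_nodes = np.log(node_k[order])
    amps = np.asarray(node_amp, dtype=float)[order]
    return np.interp(np.log(k), ln_nodes, amps)


# Three nodes; amplitudes in units of 1e-10 are purely illustrative.
nodes = [1e-4, 0.0166, 2.7]
amps = [26.0, 24.0, 18.0]
k = np.logspace(-4, 0, 50)
spectrum = interpolated_spectrum(k, nodes, amps)
```

Each additional node adds exactly one free amplitude, so the number of nodes is also the number of spectral parameters that the evidence has to justify.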

In the background cosmology we allow the possibility of a
non-flat $\Lambda$CDM cosmology specified by the following five parameters:
the physical baryonic matter density $\Omega_b h^2$, the physical dark
matter density $\Omega_{dm} h^2$, the ratio of the sound horizon to angular
diameter distance $\theta$, the optical depth to reionisation $\tau$ and the
curvature density $\Omega_k$, where the corresponding priors are listed in Table 1.
Additionally we allow a contribution to the small-scale power
in the CMB spectrum from Sunyaev-Zeldovich fluctuations, as performed
in the WMAP analysis (Dunkley et al. 2008; Komatsu et al. 2008).

The structure of the paper is as follows: in section 3 we describe
basic model selection and our algorithm; in section 4 we
list the individual datasets and discuss the combinations used; in
section 5 we review the current status of the standard scale-invariant,
tilted and running parameterisations of the power spectrum
in light of the WMAP5 data and test the possibility of a large-scale
cutoff. We then briefly discuss the consistency of the datasets
using a quantifiable Bayesian measure in section 6. The remainder
of the paper is then devoted to our optimal reconstruction
(section 7) and our conclusions (section 8).



3 BAYESIAN INFERENCE 

The Bayesian methodology provides a logical and consistent approach
to extracting inferences from a set of data. Given a model
or hypothesis $H$ defined by a set of parameters $\mathbf{\Theta}$, Bayes' theorem
tells us how to determine the probability distribution of those
parameters given the data $\mathbf{D}$:



$$\Pr(\mathbf{\Theta}|\mathbf{D},H) = \frac{\Pr(\mathbf{D}|\mathbf{\Theta},H)\,\Pr(\mathbf{\Theta}|H)}{\Pr(\mathbf{D}|H)}, \qquad (2)$$



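where for future simplicity we define $\Pr(\mathbf{\Theta}|\mathbf{D},H) \equiv P(\mathbf{\Theta})$
as the posterior probability distribution of the parameters,
$\Pr(\mathbf{D}|\mathbf{\Theta},H) \equiv \mathcal{L}(\mathbf{\Theta})$ as the data likelihood, and
$\Pr(\mathbf{\Theta}|H) \equiv \pi(\mathbf{\Theta})$ as the prior. Of particular importance here is the Bayesian
evidence term $\Pr(\mathbf{D}|H) \equiv Z$.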

To obtain parameter constraints given a model the evidence is
often ignored, since it is independent of the parameters $\mathbf{\Theta}$. The
posterior distribution is simply constructed by Monte Carlo sampling
from the combined distribution $P(\mathbf{\Theta}) \propto \mathcal{L}(\mathbf{\Theta})\pi(\mathbf{\Theta})$. Typically
most of the posterior weight lies in a relatively small range of $\mathbf{\Theta}$, and
so using some importance sampling procedure, like Metropolis-Hastings,
one quickly generates estimates of the best-fitting parameter
values and their variances.

Bayesian model selection also relies on the posterior distribution
and is based on its normalisation over the parameter space $\mathbf{\Theta}$.
This term is in fact given by the evidence $Z$ and can be computed
by performing the integral:



$$Z = \int \mathcal{L}(\mathbf{\Theta})\,\pi(\mathbf{\Theta})\,d^N\mathbf{\Theta}, \qquad (3)$$



where N is the dimensionality of the parameter space. Thus Z can 
be defined as the average of the likelihood over the prior. The evi- 
dence naturally incorporates Occam's razor: a simpler theory with 
a more compact parameter space will have a larger evidence than 
a more complicated one, unless the latter is significantly better at 
explaining the data. 
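
The Occam penalty can be seen in a one-parameter toy problem. The sketch below assumes a Gaussian likelihood and uniform priors and computes $Z$ literally as the average of the likelihood over the prior; the peak position, width and prior ranges are illustrative values, not quantities from this analysis.

```python
import numpy as np


def likelihood(theta, theta_hat=0.1, sigma=0.05):
    """Toy Gaussian likelihood peaked at theta_hat (all values are illustrative)."""
    return np.exp(-0.5 * ((theta - theta_hat) / sigma) ** 2)


def evidence_uniform_prior(lo, hi, n=200_001):
    """Z = average of the likelihood over a uniform prior on [lo, hi]."""
    theta = np.linspace(lo, hi, n)
    return likelihood(theta).mean()


ln_z_fixed = np.log(likelihood(0.0))                    # no free parameter: theta = 0
ln_z_narrow = np.log(evidence_uniform_prior(-0.5, 0.5))
ln_z_wide = np.log(evidence_uniform_prior(-5.0, 5.0))
```

In this toy case the best fit lies two standard deviations from the fixed value $\theta = 0$, yet the model with a free parameter and the narrow prior gains no evidence over the parameter-free model, and widening the prior tenfold lowers $\ln Z$ by a further $\ln 10 \approx 2.3$ even though the best fit is unchanged.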

The question of which model best describes the data can then
be addressed by comparing the properly normalised posterior probability
distributions calculated for two hypotheses $H_0$ and $H_1$:



$$\frac{\Pr(H_1|\mathbf{D})}{\Pr(H_0|\mathbf{D})} = \frac{\Pr(\mathbf{D}|H_1)\Pr(H_1)}{\Pr(\mathbf{D}|H_0)\Pr(H_0)} = \frac{Z_1}{Z_0}\frac{\Pr(H_1)}{\Pr(H_0)}, \qquad (4)$$



where $\Pr(H_1)/\Pr(H_0)$ is the a priori probability ratio for the two
models, which can be set to unity if we have no reason to prefer
hypothesis $H_0$ over $H_1$ initially. For convenience the ratio of
evidences $Z_1/Z_0$ (or, in logarithmic units, the difference in log evidences
$\ln Z_1 - \ln Z_0$) is often termed the Bayes' factor $B_{01}$. Interpreting
the level of significance one should ascribe to a given $B$ value is
often a matter of experienced judgement; however, a suitable guideline
scale has been laid out by Jeffreys (1961). With $B$ quoted in log units,
if $B < 1$ then $H_1$ should not be favoured over $H_0$; $1 < B < 2.5$ is significant,
$2.5 < B < 5$ is strong evidence, while $B > 5$ would be considered decisive.
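
The guideline scale can be encoded as a small lookup. This is a sketch only: the function name is hypothetical, and the treatment of values falling exactly on a boundary (1, 2.5 or 5) is an assumption, since the text quotes open intervals.

```python
def jeffreys_category(b):
    """Map a log Bayes' factor B = ln Z_1 - ln Z_0 onto the guideline scale."""
    if b < 1.0:
        return "not favoured"
    if b < 2.5:
        return "significant"
    if b < 5.0:
        return "strong"
    return "decisive"


labels = {b: jeffreys_category(b) for b in (0.4, 1.6, 2.6, 6.0)}
```

Given the quoted uncertainty of about 0.3 log units on each Bayes' factor, a value lying close to a boundary should not be read as belonging firmly to either category.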

Computation of the multidimensional integral in Eqn. 3 is not
a trivial task, and approaches such as thermodynamic integration
have previously been shown to be both slow and inaccurate. In this
analysis we apply the method of nested sampling (Skilling 2004),
which transforms the $N$-dimensional integral in Eqn. 3 to one dimension
and computes it by drawing uniform samples from ever
decreasing nested shells in the prior parameter space. We apply an
algorithm based on this procedure called MultiNest, which constrains
the nested shells in the prior space with $N$-dimensional ellipsoids
(Feroz & Hobson 2008; Feroz et al. 2008). This approach
results in an order of magnitude improvement in efficiency and
accuracy over previous methods.
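
The basic nested sampling loop can be illustrated with a toy problem. The sketch below is not MultiNest: new points are drawn by brute-force rejection from the whole prior rather than from ellipsoids, the prior is assumed uniform on the unit square, and the Gaussian likelihood, the number of live points and the iteration count are all illustrative choices.

```python
import numpy as np


def log_likelihood(theta, centre=0.5, sigma=0.1):
    """Toy Gaussian likelihood on the unit square (illustrative only)."""
    return -0.5 * np.sum(((theta - centre) / sigma) ** 2)


def nested_sampling(log_like, ndim=2, n_live=200, n_iter=1200, seed=0):
    """Estimate ln Z for a uniform prior on the unit hypercube."""
    rng = np.random.default_rng(seed)
    live = rng.random((n_live, ndim))
    live_logl = np.array([log_like(p) for p in live])
    log_z = -np.inf
    for i in range(1, n_iter + 1):
        worst = np.argmin(live_logl)
        # Prior volume shrinks as X_i = exp(-i / n_live); the weight is the shell width.
        log_width = np.log(np.exp(-(i - 1) / n_live) - np.exp(-i / n_live))
        log_z = np.logaddexp(log_z, live_logl[worst] + log_width)
        # Replace the worst point by a prior draw of higher likelihood
        # (plain rejection sampling; MultiNest draws from ellipsoids instead).
        threshold = live_logl[worst]
        while True:
            candidate = rng.random(ndim)
            cand_logl = log_like(candidate)
            if cand_logl > threshold:
                break
        live[worst], live_logl[worst] = candidate, cand_logl
    # Contribution of the prior volume still enclosed by the live points.
    log_x_final = -n_iter / n_live
    log_z = np.logaddexp(log_z, log_x_final + np.log(np.mean(np.exp(live_logl))))
    return log_z


ln_z = nested_sampling(log_likelihood)
ln_z_exact = np.log(2.0 * np.pi * 0.1 ** 2)   # analytic value for this toy problem
```

The rejection step becomes very inefficient as the enclosed prior volume shrinks, which is the cost that bounding the live points with ellipsoids is designed to avoid.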

4 DATASETS CONSIDERED

In this analysis we have divided the data into two categories: CMB
only and CMB plus observations of the matter power spectra from
Large Scale Structure (LSS) surveys. This is primarily designed so
that we can test consistency across the datasets in an initial analysis
before carrying over a final set of data to our power spectrum
reconstruction. We consider a number of CMB experiments, including
the latest five year release from WMAP (Hinshaw et al. 2008)
plus recent results from the Arcminute Cosmology Bolometer Array
[ACBAR; Reichardt et al. 2008], which should be uniquely
useful here due to their tight constraints out to small angular
scales. In addition we include Cosmic Background Imager observations
[CBI; CBI Supplementary Data 2004; Readhead et al. 2004]
and Balloon Observations of Millimetric Extra-galactic Radiation
and Geophysics [BOOMERANG; Piacentini et al. 2006;
Jones et al. 2006; Montroy et al. 2003]. LSS data include the luminous
red galaxy (LRG) subset DR4 of the Sloan Digital Sky Survey
[SDSS; Tegmark et al. 2004] and the two degree field survey
[2dF; Cole et al. 2005]. We allow for modelling of non-linearities
and galaxy biasing of the matter power spectrum in the LRG sample
using the transfer function defined by Cole et al. (2005). We
analytically marginalise over the parameter combination $Qb^2$ and
set $A = 1.4$, as shown by Cole et al. (2005) to be adequate.



5 SIMPLE POWER SPECTRUM MODELS 

Many previous analyses have considered the four most basic
parameterisations described in Section 2 in light of WMAP observations
plus a plethora of higher resolution CMB and Large Scale
Structure (LSS) data. Here we will briefly summarise the current
status of these models. The first year WMAP [WMAP1] data on
its own had no preference for a tilt ($n_s = 0.99 \pm 0.04$), but the
inclusion of higher resolution CMB data and LSS data induced
a marked red tilt (Spergel et al. 2003). By year three of WMAP
[WMAP3], with tighter constraints on the second and third acoustic
peaks, a red tilt became discernible even without additional datasets
(Spergel et al. 2007): $n_s = 0.958 \pm 0.016$. The recent WMAP
five year release confirms the value at $\sim 0.96$, with a mean estimate
of $0.963 \pm 0.015$ (Komatsu et al. 2008). The status of a
running spectral index has been more controversial: WMAP1 alone
preferred a large mean value of $n_{run}$, though with little statistical
significance; with WMAP3 alone, however, a value of $n_{run}$ was
found that deviated from zero at the $1\sigma$ level. A number of authors
(Viel et al. 2006; Seljak et al. 2006; Bridges et al. 2007) have
subsequently found that, in the case of WMAP3, running was almost
completely removed on addition of the SDSS Ly-$\alpha$ forest data
(McDonald et al. 2006). Ly-$\alpha$ data probe scales ($\sim$ Mpc) that are small in
comparison to the other datasets used, and so provide a long 'lever
arm' for primordial spectrum analyses. However, further discrepancies
in other cosmological parameters, at the level of almost $2\sigma$,
have cast some doubt on the conclusions made when using these data,
so we do not include them here.

Theoretically motivated priors on the tilt are easily extracted
from the slow-roll inflationary framework as $n_s = 1 - 6\epsilon + 2\eta$,
where $\epsilon$ and $\eta$ are the slow-roll parameters. For the slow-roll
conditions to be met we then require that $\epsilon \approx 0$ and that $|\eta| \ll 1$. If
we assume that $|\eta|$ must be $\le 0.1$ we get $n_s = 1 \pm 0.2$ (Trotta
2007). Spectral running is expected to be small; in fact $n_{run}$ even
at the level of 0.05 would rule out all simple inflationary scenarios
(Easther & Peiris 2006). Thus, if assuming slow-roll inflation, we
are free to set quite a tight prior $-0.2 \le n_{run} \le 0.2$. Uniform
prior distributions over these ranges were adopted throughout.

Figure 1. Marginalised posterior probability of the spectral tilt $n_s$ using
CMB plus LSS data (solid) and CMB data alone (dotted). Note: in this and
all subsequent figures each posterior is normalised independently.

Using MultiNest as described in section 3, a set of posterior
samples and model evidences was computed using the two basic
datasets described in section 4 for the basic suite of models: the H-Z
spectrum, a tilted spectrum, a tilted spectrum with running and a
tilted spectrum with a large scale cutoff. For now this simply serves
as a useful sanity check for consistency between datasets, but later,
in section 6, the appropriate Bayesian consistency measure will be
applied to quantify any discrepancy.

We will now discuss the most common set of parameters that
are typically used to describe the primordial spectrum: $n_s$ from the
tilted spectrum and $n_{run}$ from the tilted spectrum with running.
Figure 1 shows the marginalised posterior distribution on $n_s$ from the
tilted power spectrum using CMB data alone and in a joint analysis
with LSS data. We find a mean value of $n_s = 0.962 \pm 0.018$, this
value shifting upwards only marginally when including LSS ($n_s =
0.967$). These results are in good agreement with Komatsu et al.
(2008) despite our relaxation of the requirement for universal flatness.
Deviations from $n_s = 1$ such as these, at $\sim 2\sigma$, are now seen
as persuasive evidence for a red tilt. The Bayesian evidence, however,
would need a significantly larger deviation (in fact closer to the
level of $5\sigma$!) to conclude decisively that tilt was present. At present
these results produce a Bayes' factor of $B_{\mathrm{H\text{-}Z},n_s} \sim 1.1 - 1.6$
(see Table 2), which is significant but certainly not strong evidence
in favour of a tilt. Running in the spectrum remains ambiguous
with CMB data alone (roughly a $1\sigma$ deviation from $n_{run} = 0$),
but the addition of LSS data shifts the mean value to within $\pm 0.02$
of zero (Fig. 2). This effect has been observed on a number of occasions
(e.g. Tegmark et al. 2003; Bridges et al. 2007) and is due
mainly to the excellent high-$k$ constraints coming from the LRG
data. The evidence does not favour running in either dataset, with
$|B_{\mathrm{H\text{-}Z},n_{run}}| \sim 0.4$, just outside our estimated margin of error.



Figure 3 shows the measured $C_\ell$ values at low $\ell$ for WMAP1,
3 and 5 with the best-fit theoretical model (and corresponding
cosmic variance limits) as determined by Dunkley et al. (2008).
The mean $C_\ell$ estimators at both the quadrupole and octopole in
WMAP1 are seen to be deviant from the fiducial model by close

Figure 2. Marginalised posterior probability of spectral running $n_{run}$ using
CMB plus LSS data (solid) and CMB data alone (dotted).



Table 2. Bayes' factors comparing a scale invariant (H-Z) spectrum with 
models containing tilt, running and a large scale cutoff using both CMB 
alone and CMB + LSS data. 



Model        CMB            CMB + LSS
H-Z          0.0 ± 0.3      0.0 ± 0.3
$n_s$        +1.6 ± 0.3     +1.1 ± 0.3
$n_{run}$    +0.4 ± 0.3     -0.4 ± 0.3
$k_c$        +1.5 ± 0.3     +1.3 ± 0.3



to the cosmic variance limit. The situation changed somewhat in
the three-year (and subsequently five-year) release, so that now the
octopole has shifted upwards to lie comfortably close to its expected
value, but the quadrupole remains anomalously low. The
statistical significance has been questioned by many authors (e.g.
Efstathiou 2003b) and spurious alignments between the affected
multipoles have been suggested as evidence of some large scale
foreground contamination (de Oliveira-Costa et al. 2004). However,
here we shall assume that the effect is a real one and attempt to
explain the large-scale CMB decrement with a feature in the primordial
spectrum.

Naturally, at present the data will prefer a model that includes
a large scale cutoff, but do the data find one necessary? We can
test this with a simple 'cartoon' model by abruptly curtailing a tilted
spectrum below some variable scale $k_c$, so that its form is given by:

$$\mathcal{P}(k) = \begin{cases} 0, & k < k_c \\ A_s \left(k/k_0\right)^{n_s - 1}, & k \ge k_c \end{cases} \qquad (5)$$

The marginalised posterior distributions for $k_c$ in Fig. 4 show a preferred
scale around $2.7 \times 10^{-4}$ Mpc$^{-1}$, consistent with an angular
scale around $\ell \sim 2 - 4$ as expected. Interestingly, although blind
to scales around the cutoff, a joint analysis with LSS data shows a
pronounced peak at $k_c \sim 0$, suggesting that the constraining power
of, particularly, LRG data now matches that of current CMB data. In other
words, now that constraints at smaller scales are becoming tighter,
anomalies such as the cutoff are becoming less important. The evidence
confirms this (see Table 2), showing that the extra parameter
is superfluous.

The current position of these standard parameterisations then
appears straightforward: with CMB data alone and in joint analysis
with LSS, a purely scale-invariant spectrum is significantly disfavoured
by the data. However, the addition of a running parameter
remains of dubious necessity with CMB data alone and is actually



Figure 3. Low-$\ell$ multipoles and $1\sigma$ error bars from three releases of WMAP
data. The best-fit fiducial power spectrum based on WMAP5 inferences is
also plotted and shows the associated cosmic variance limits. [Note: $\ell$ values
are slightly offset for clarity.]

Figure 4. Marginalised posterior probability of the large scale spectral cutoff
$k_c$ using CMB plus LSS data (solid) and CMB data alone (dotted).

disfavoured when LSS constraints are included. A large scale cut- 
off in the primordial spectrum remains a suitable explanation of the 
WMAP quadrupole decrement but according to the evidence there 
is currently no need to include it in the model. 



6 DATASET CONSISTENCY 

Combining multiple datasets in joint analyses, in particular the
recent inclusion of observations of the baryonic acoustic oscillations
in LSS surveys with CMB observations, has led to tight
constraints on the cosmological parameters (Tegmark et al. 2006).
Authors regularly comment on the relative consistency between
datasets by comparing the parameter constraints made with each
set individually and when combined; however, little effort is normally
made to quantify this consistency. Marshall et al. (2006) established
just such a method using the Bayesian evidence (see also
Hobson et al. 2002). This is important for our reconstruction, as
experimental features, such as discontinuities on scales where observations
meet, may result in false detections of spectral structure.
The two datasets chosen, CMB and LSS, now overlap considerably
on scales starting around $k \sim 0.02$ Mpc$^{-1}$. If a data inconsistency
were to exist it would likely appear as a feature close to
this scale. Curiously, such a feature has been identified: Verde et al.
(2003) detected a deviation from a simple tilt around $k \sim 0.01$

Table 3. Bayes' factors comparing the assumption of dataset consistency
($H_0$ = consistent, $H_1$ = inconsistent) using CMB + LSS datasets for each
of the models considered above.

Model        $B_{01}$
H-Z          +2.6 ± 0.3
$n_s$        +1.9 ± 0.3
$n_{run}$    +1.1 ± 0.3
$k_c$        +1.5 ± 0.3



Mpc$^{-1}$. This effect was strongest when using WMAP data alone,
appearing considerably reduced in joint analyses with other higher
resolution CMB and LSS data. Here we will apply a Bayesian consistency
check to assess whether we can be justified in combining
these datasets in our analyses.

Consider the null hypothesis $H_0$ that, given two independent
sets of data, there is one model and one set of parameters to explain
them. In this case we would say that the datasets are 'consistent'.
However, we would really like a quantitative measure by which to
assess this consistency. If we consider the alternative, $H_1$, that each
dataset separately prefers a different set of parameters, we can then
construct the Bayes' factor between the two hypotheses as:



$$\frac{\Pr(\mathbf{D}|H_0)}{\Pr(\mathbf{D}|H_1)} = \frac{Z(\mathbf{D})}{\prod_i Z(D_i)}, \qquad (6, 7)$$



where we have written $\Pr(\mathbf{D}|H_1)$ as the product of evidences from
each individual (independent) dataset $D_i$. In this form consistency
can easily be checked by computing the joint evidence and the evidence
due to each dataset separately. As with any other hypothesis
test we can assess the appropriate model with the aid of the
Jeffreys' scale based on the final Bayes' factor.

Table 3 lists the appropriate Bayes' factors for each model
based on our two datasets: CMB and CMB+LSS. Firstly, all factors
are positive and greater than unity, confirming that these sets
of data are indeed all essentially free from discrepancies. On the
Jeffreys' scale, hypothesis $H_0$, that the datasets are consistent, is
favoured significantly. The highest degree of consistency occurs
for the H-Z model; this is not surprising, as both datasets provide
equivalent constraints on the amplitude of fluctuations. Where we
did observe differences in parameter constraints, with the running
and cutoff models, we can see how this measure has quantified the
discrepancy. For instance, the addition of LSS data led to slightly
tighter constraints on the parameter $n_{run}$ and pulled it
closer to zero (Fig. 2), and this difference has lowered the evidence
in favour of consistency from nearly two log units to $\sim 1$. A similar
but less pronounced effect is observed with the cutoff model.

The deviations seen are minor. The worst discrepancy found,
using the running model, was still consistent with CMB data, with
odds of around 3:1 in favour (i.e. $e^{\Delta \ln Z} = e^{B_{01}}$), while under the
assumption of scale invariance the datasets are consistent at around
14:1 in favour. These differences are best explained by the superior
small scale constraints that are possible when using LSS data, rather
than by a genuine inconsistency, and we feel it is justified to perform
our reconstruction using the joint set of data given the increased
constraining power possible.
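
The arithmetic behind these odds is short enough to spell out. The sketch below assumes the consistency measure is the log of the joint evidence minus the sum of the individual log evidences, as described in this section; the function names are hypothetical, and only the two Bayes' factors taken from Table 3 are values from the text.

```python
import math


def log_consistency_ratio(ln_z_joint, ln_z_separate):
    """ln of Z(D) / prod_i Z(D_i): positive values favour consistency (H_0)."""
    return ln_z_joint - sum(ln_z_separate)


def odds_in_favour(log_bayes_factor):
    """Convert a log Bayes' factor into odds of e^B to 1."""
    return math.exp(log_bayes_factor)


# Log Bayes' factors B_01 quoted in Table 3
for model, b01 in {"H-Z": 2.6, "running": 1.1}.items():
    print(f"{model}: about {odds_in_favour(b01):.1f}:1 in favour of consistency")
```

Exponentiating 2.6 and 1.1 gives roughly 13.5 and 3.0, which is where the quoted odds of around 14:1 and 3:1 come from.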



7 OPTIMAL POWER SPECTRUM RECONSTRUCTION 

The degree of structure that can or should be usefully constrained
in the primordial spectrum has been a source of increasing debate
in the literature. Recently Verde & Peiris (2008) applied a smoothing
spline technique (Sealfon et al. 2005) that attempts, via
cross-validation with part of the data, to minimise the complexity of the
parameterisation. This approach selects an initial set of 'knots' that
are fixed in $k$-space but whose amplitudes may vary, and through
which splines are fitted, thus constructing the primordial
spectrum. It will preferentially identify smooth structures
rather than sharp breaks, and while it is true that most deviations
from scale invariance given the slow-roll assumption will be
smooth, we do not believe the data are currently accurate enough for
this to be the limiting factor for an analysis. We have thus attempted
to use the simplest reconstruction possible, while still maintaining
continuity, by linearly interpolating between a set of nodes at
which we allow the amplitude to vary. Our reconstructions gain
complexity by the addition of new nodes, and on estimating the evidence
for each reconstruction one can decide exactly the level of
parameterisation deemed necessary by the data.

We start with one node, see Fig. 5 (a), so our base model is
equivalent to the scale-invariant H-Z spectrum. The next model, (b),
allows for two sufficiently separated, independently varying nodes,
thus emulating a tilted spectrum. We then add a third node (c),
spaced logarithmically midway between the two existing nodes. This
process continues, at each stage the additional node being added
between the existing ones, so that at the fourth stage there are two
possibilities, (d) and (e). At the fifth stage there are three possibilities,
at the sixth four, and so on. One can see that by such a process,
using the evidence as the model discriminator at each stage, not
only is the number of parameters constrained but also the location
of features in $k$-space, so that we can faithfully reconstruct both the
degree and position of any spectral structure. It should also be clear
that if we branch at one reconstruction by accepting a new node
at some position [say the lower $k$ node in (d) rather than (e)], we
still retain the option of splitting the unaccepted region later [i.e. in
(h)]. Thus we fully explore the options in feature space and should
hierarchically detect as much structure as the data will allow.
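
The node-splitting scheme can be sketched as follows, assuming that "logarithmically midway" means the geometric mean of the two neighbouring node positions. The function names are hypothetical; the three starting positions are the extremal nodes and the third node quoted elsewhere in Section 7.

```python
import math


def log_midpoint(k_low, k_high):
    """Point midway between two nodes in ln k (their geometric mean)."""
    return math.sqrt(k_low * k_high)


def candidate_splits(nodes):
    """All models one node richer: split each existing bin at its log-midpoint."""
    nodes = sorted(nodes)
    return [sorted(nodes + [log_midpoint(a, b)]) for a, b in zip(nodes, nodes[1:])]


three_node = [1e-4, 0.0166, 2.7]                 # node positions in Mpc^-1
four_node_options = candidate_splits(three_node)  # the two fourth-stage models
five_node_options = candidate_splits(four_node_options[0])
```

A model with $n$ nodes has $n - 1$ bins, and hence $n - 1$ candidate successors, which reproduces the count of two possibilities at the fourth stage, three at the fifth and four at the sixth.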

The only assumptions required are the positions of the two
extremal nodes, $k_{min}$ and $k_{max}$. These bounds were chosen to lie at
sufficiently large ($k_{max} = 2.7$ Mpc$^{-1}$) and small ($k_{min} = 0.0001$
Mpc$^{-1}$) wavenumbers so as safely to encompass all current observational
probes and, crucially, when more than 2 nodes are used, to allow
the spectrum to tend naturally to zero power, particularly on small
scales. A conservative amplitude prior between 0 and $55 \times 10^{-10}$ was used
throughout on all nodes.



7.1 Model Comparison I: the Bayesian evidence 

The marginalised 1-dimensional posterior distributions for the amplitude
at each node and for each reconstruction are shown in Fig. 6
[Fig. 5 illustrates the corresponding form of the reconstructed spectra
from the mean posterior estimates (with $1\sigma$ error bars on the
amplitudes)]. Comparing panel (b) in both Figs. 5 and 6 we see
that there is only an upper bound on the amplitude at $k_{max}$, with no
lower bound. This is a consequence of our choice of a large $k_{max}$,
well above any current experimental constraint, and simply allows
the power to fall gradually to zero. The difference in evidence is
minimal between the base and two-node models, with $B_{12} = 0.66$
being too small on the Jeffreys' scale to draw any decisive conclusions,
though within the error the evidence marginally prefers

Figure 5. Linearly interpolated reconstructions of the primordial spectrum with associated Bayes' factors with respect to model 1. The amplitude was allowed to
vary at each of the nodes (shown with black circles). Mean amplitude values and $1\sigma$ limits are shown, taken from the posteriors illustrated in Fig. 6.
Panel labels and Bayes' factors:

(a) 1: $B_{11} = 0.00 \pm 0.30$
(b) 2: $B_{21} = +0.66 \pm 0.30$
(c) 3: $B_{31} = +1.08 \pm 0.30$
(d) 4i: $B_{4\mathrm{i},1} = -0.34 \pm 0.30$
(e) 4ii: $B_{4\mathrm{ii},1} = -1.41 \pm 0.30$
(f) 5i: $B_{5\mathrm{i},1} = -0.51 \pm 0.30$
(g) 5ii: $B_{5\mathrm{ii},1} = -2.41 \pm 0.30$
(h) 5iii: $B_{5\mathrm{iii},1} = -2.05 \pm 0.30$
(i) 6i: $B_{6\mathrm{i},1} = -0.21 \pm 0.30$
(j) 6ii: $B_{6\mathrm{ii},1} = -0.40 \pm 0.30$
(k) 6iii: $B_{6\mathrm{iii},1} = -2.10 \pm 0.30$
(l) 6iv: $B_{6\mathrm{iv},1} = -1.97 \pm 0.30$

Figure 6. Marginalised 1-dimensional posterior distributions of the amplitude at each $k$-space node used in each reconstruction.
Panels (a) to (l) correspond to the models of Fig. 5 and carry the same Bayes' factors; the horizontal axis of each panel is $\mathcal{P}(k) \times 10^{-10}$.

model 2. The third model adds a node at $k \sim 0.0166$ Mpc$^{-1}$, emulating
a degree of spectral running by allowing a slight variation in
the interpolated slopes between the three nodes. Though no meaningful
constraint is possible at the upper $k$ scale, this model is preferred
over model 2 with $B_{23} \sim 0.4$ and significantly over the base
model by $B_{13} \sim 1.1$ units. The fourth stage reconstruction requires
us to test two combinations of node positions: the first, 4i, splits the
lowest $k$ bin at $k \sim 0.00129$ Mpc$^{-1}$, while the second, 4ii, divides
the upper $k$ bin. $B_{3,4\mathrm{i}}$ and $B_{3,4\mathrm{ii}}$ both significantly disfavour the
addition of a fourth node. This result points to some deviation from
scale invariance at around the position $k \sim 0.01$ Mpc$^{-1}$, the rough
location of the additional node in model 3. Further parameterisation
both above (4ii) and below (4i) this scale is disfavoured, lending
credence to the general conclusions of Verde & Peiris (2008), who
found a similar 'turn-over' scale. According to the evidence the
optimal reconstruction contains, perhaps surprisingly, only three
parameters.

It is interesting to note that the parameterisation in 4i is
significantly preferred over 4ii, i.e. an additional node seems to be
preferred on large scales over small. Although technically redundant,
we can continue to a fifth and sixth stage to see if this effect
continues. Assuming then that the fourth stage evidence has now
indicated a preference for large scale (small k) structure over small,
we continue by sub-dividing the largest scale (lowest k) bin of 4i again
at $k \sim 0.00036$ Mpc$^{-1}$, which we denote as 5i, the two other possible
splittings being 5ii at $k \sim 0.00462$ Mpc$^{-1}$ and 5iii at
$k \sim 0.21$ Mpc$^{-1}$. To within estimated error $B_{4i,5i} \sim 0$, and
again both 5ii and 5iii are significantly disfavoured. This result is
repeated at the sixth stage.
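
The node positions quoted for the large scale splittings are consistent with dividing each bin at its midpoint in ln k, i.e. at the geometric mean of the bounding nodes. The Python sketch below checks this arithmetic. The lower bound `K_MIN` of 10^-4 Mpc^-1 is an assumption inferred from the quoted numbers and is not stated in this section, and `split_bin` is a hypothetical helper name. The 5iii position is not reproduced because it depends on the upper bound, which is not given here.

```python
import math

def split_bin(k_lo, k_hi):
    """Midpoint of a bin in ln k, i.e. the geometric mean of its edges."""
    return math.sqrt(k_lo * k_hi)

K_MIN = 1.0e-4   # Mpc^-1, assumed lower node (not stated in this section)
K_MID = 0.0166   # Mpc^-1, node added in model 3 (quoted in the text)

k_4i = split_bin(K_MIN, K_MID)   # text quotes ~0.00129
k_5i = split_bin(K_MIN, k_4i)    # text quotes ~0.00036
k_5ii = split_bin(k_4i, K_MID)   # text quotes ~0.00462

for name, value in [("4i", k_4i), ("5i", k_5i), ("5ii", k_5ii)]:
    print(f"{name}: k = {value:.5f} Mpc^-1")
```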

So, curiously, although the evidence peaks at model 3, there is
a substantial preference in all subsequent reconstructions for
additional amplitude nodes to be placed at large scales (i.e. models 4i,
5i and 6i). Furthermore, the evidence is observed to plateau in value,
with $B_{4i,5i}$ and $B_{5i,6i}$ being roughly zero. The first result could
suggest that, although the data cannot yet cope with the extra
complexity, large scale structure is useful in a model. However, when
combined with the second result, this points to the additional
parameters not over-complicating the model but instead being ignored and
left unconstrained by the data. The evidence is quite deliberately
adept at ignoring such extra complexity; the extra undetermined
parameter direction simply does not affect the average posterior
over the prior. This effect is demonstrated here by comparing
Figures 5 (d) and (f), where the act of placing an additional node at
$k \sim 0.00129$ Mpc$^{-1}$ removes all constraint on the amplitude at node
$k_{min}$ and thus de facto removes a parameter from the analysis. To
account correctly for this effect, the analyst requires a further level
of model discrimination that can interpret quantitatively the
constraining power of a given model and data combination. For this we
must fully define what we are penalising in extra model complexity,
and for this we turn to the Bayesian complexity.
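
The insensitivity of the evidence to an unconstrained direction can be seen in a toy calculation. The Python sketch below assumes a separable Gaussian likelihood and uniform priors, with all widths chosen arbitrarily for illustration; none of the numbers come from the analysis in this paper. Model A has one constrained parameter, model B adds a parameter the likelihood ignores, and model C adds a parameter that is constrained without improving the best fit.

```python
import math

def midpoint_integral(f, lo, hi, n=2000):
    """Midpoint-rule estimate of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def gaussian_like(x, sigma=0.5):
    """Unnormalised Gaussian likelihood factor with peak value 1."""
    return math.exp(-0.5 * (x / sigma) ** 2)

LO, HI = -5.0, 5.0   # uniform prior range used for every parameter (assumed)
WIDTH = HI - LO

# Model A: evidence is the prior-averaged likelihood of one parameter.
z_a = midpoint_integral(gaussian_like, LO, HI) / WIDTH

# Model B: the likelihood does not depend on the extra parameter, so the
# separable integral over its prior contributes a factor of exactly one.
z_b = z_a * midpoint_integral(lambda b: 1.0, LO, HI) / WIDTH

# Model C: the extra parameter is constrained but the peak likelihood is
# unchanged, so the evidence is diluted by an Occam factor.
z_c = z_a * midpoint_integral(gaussian_like, LO, HI) / WIDTH

ln_b_ab = math.log(z_a / z_b)   # close to zero: B is not penalised
ln_b_ac = math.log(z_a / z_c)   # positive: C is penalised
print(ln_b_ab, ln_b_ac)
```

In this toy set-up the evidence alone cannot distinguish model A from model B, which is the situation that motivates a second diagnostic.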

7.2 Model Comparison II: the Bayesian complexity 

The advantage of Bayesian model selection is that it penalises
model parameters that cannot be justified by the data. However,
the number of free parameters is only the most naive measure of
the complexity of a model. A more thorough comparison can be
gleaned from what is termed the Bayesian or effective complexity
of a model. This definition was first given by Spiegelhalter et al.
(2002) and was subsequently introduced into cosmology by Kunz et al.
(2006). The starting point is a quantifiable definition of how a set
of data can improve the prior knowledge of a model: in other words,
a measure of the relative difference between the posterior and prior
distributions, sometimes termed the information gain. The
Kullback-Leibler (KL) divergence $D_{KL}$ measures just this, via the
relative entropy between two probability distributions, $P$ and $\pi$:

$$D_{KL}(P, \pi) = \int P(\Theta|D) \ln\left[\frac{P(\Theta|D)}{\pi(\Theta)}\right] d\Theta. \tag{8}$$
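
As a numerical illustration of the KL divergence, the Python sketch below evaluates it for a one-dimensional Gaussian posterior against a much wider uniform prior and compares the result with the closed form ln[W / (σ sqrt(2πe))], which holds when the prior width W comfortably contains the posterior. The Gaussian form, the widths and the function names are assumptions made for this example only.

```python
import math

def kl_gaussian_vs_uniform(sigma, width, n=20000):
    """Midpoint-rule D_KL(P, pi) for P = N(0, sigma^2), pi uniform of given width."""
    prior = 1.0 / width
    h = width / n
    total = 0.0
    for i in range(n):
        theta = -0.5 * width + (i + 0.5) * h
        p = math.exp(-0.5 * (theta / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        if p > 0.0:  # skip points where the posterior underflows to zero
            total += p * math.log(p / prior) * h
    return total

def kl_closed_form(sigma, width):
    """Analytic result, valid when the prior is much wider than the posterior."""
    return math.log(width / (sigma * math.sqrt(2.0 * math.pi * math.e)))

WIDTH = 20.0  # illustrative prior width
for sigma in (1.0, 0.1):
    print(sigma, kl_gaussian_vs_uniform(sigma, WIDTH), kl_closed_form(sigma, WIDTH))
```

Narrowing the posterior by a factor of ten raises the divergence by ln 10, which matches the reading of the KL divergence as an information gain.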

From this definition the Bayesian complexity can then be defined
as the difference in $D_{KL}$ between some real experiment and the
ideal situation where the information gain is maximised. To see
how this works, let us take the ideal example of a uniform prior
distribution $\pi$ and an excellent set of data $D$ such that on
completion of a Bayesian analysis the prior distribution collapses into
a $\delta$-function posterior distribution about some parameter vector
$\hat{\Theta}$. This we take as our ideal scenario, in which the divergence
between posterior and prior is maximised and is given approximately by
$\hat{D}_{KL} = \ln[P(\hat{\Theta})/\pi(\hat{\Theta})]$. In a realistic
experiment, of course, the posterior $P(\Theta)$ will resemble some
(approximately) multidimensional Gaussian distribution with some mean
parameter vector and an associated variance, so that the divergence
would be given simply by Eqn. (8). The Bayesian complexity $C_B$ can
thus be defined as the difference between the ideal point estimate
$\hat{D}_{KL}$ and the actual divergence:

$$C_B = -2\left(D_{KL}(P, \pi) - \hat{D}_{KL}\right). \tag{9}$$

This leaves us free to choose an appropriate point estimate that
maximises information gain, which for most well constrained
cosmological problems can be taken to be the mean of the full posterior
distribution. Using Eqn. (8) and Bayes' theorem one can rewrite
Eqn. (9) as:

$$C_B = -2\int P(\Theta|D) \ln\mathcal{L}(\Theta)\, d\Theta + 2\ln\mathcal{L}(\hat{\Theta}). \tag{10}$$
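
The intermediate steps between Eqns. (9) and (10) can be written out as follows. Here the point estimate is denoted $\hat{\Theta}$, and the symbol $Z$ for the evidence is notation introduced for this sketch. The derivation uses only Bayes' theorem and the normalisation of the posterior.

```latex
\begin{align*}
P(\Theta|D) &= \frac{\mathcal{L}(\Theta)\,\pi(\Theta)}{Z}, \qquad
  Z = \int \mathcal{L}(\Theta)\,\pi(\Theta)\,d\Theta, \\
D_{KL}(P,\pi) &= \int P(\Theta|D)\,\ln\frac{\mathcal{L}(\Theta)}{Z}\,d\Theta
  = \int P(\Theta|D)\,\ln\mathcal{L}(\Theta)\,d\Theta - \ln Z, \\
\hat{D}_{KL} &= \ln\frac{P(\hat{\Theta}|D)}{\pi(\hat{\Theta})}
  = \ln\mathcal{L}(\hat{\Theta}) - \ln Z, \\
C_B &= -2\left(D_{KL} - \hat{D}_{KL}\right)
  = -2\int P(\Theta|D)\,\ln\mathcal{L}(\Theta)\,d\Theta
    + 2\ln\mathcal{L}(\hat{\Theta}).
\end{align*}
```

The evidence term $\ln Z$ cancels in the difference, so the complexity depends only on the likelihood evaluated over the posterior and at the point estimate.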

By defining an effective $\chi^2$ through $\mathcal{L}(\Theta) \propto e^{-\chi^2/2}$,
such that all constant factors within the likelihoods drop out, we can
define the Bayesian complexity as:

$$C_B = \overline{\chi^2(\Theta)} - \chi^2(\hat{\Theta}), \tag{11}$$

where the first term denotes the mean $\chi^2$ across a set of posterior
samples while the second term is the $\chi^2$ at the mean parameter
values.
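
Eqn. (11) translates directly into an estimator over posterior samples. The Python sketch below applies it to a toy three-parameter Gaussian posterior, for which the complexity should approach the number of parameters. The equal-weight samples, the toy $\chi^2$ and all numerical values are assumptions made for illustration; weighted chains would need weighted means instead.

```python
import random

def bayesian_complexity(samples, chi2_fn):
    """Mean chi^2 over equal-weight samples minus chi^2 at the sample mean."""
    n = len(samples)
    dim = len(samples[0])
    mean_chi2 = sum(chi2_fn(s) for s in samples) / n
    mean_theta = [sum(s[j] for s in samples) / n for j in range(dim)]
    return mean_chi2 - chi2_fn(mean_theta)

# Toy Gaussian problem: three well constrained, independent parameters.
MU = [1.0, -2.0, 0.5]      # illustrative best-fit values
SIGMA = [0.1, 0.3, 1.0]    # illustrative posterior widths

def toy_chi2(theta):
    return sum(((t - m) / s) ** 2 for t, m, s in zip(theta, MU, SIGMA))

rng = random.Random(0)
toy_samples = [[rng.gauss(m, s) for m, s in zip(MU, SIGMA)] for _ in range(20000)]

c_b = bayesian_complexity(toy_samples, toy_chi2)
print(f"C_B = {c_b:.2f} for C_0 = {len(MU)} parameters")
```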

Based on this definition the Bayesian complexity succinctly
compares the constraining power of the data with the predictivity
of the model. Thus a model with highly restrictive priors and
unconstrained posteriors will have a low Bayesian complexity, as the
predictiveness of the model was already very high initially.
Conversely, wide priors with highly constrained posteriors will result
in a high complexity (which can tend to a maximal value equal to
the actual number of model parameters, $C_0$), as the data constrained
the model substantially over the uninformative priors.
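
The contrast between constrained and unconstrained directions can be made concrete with a two-parameter toy model in which the likelihood depends on only one parameter. In the Python sketch below the second parameter simply follows its uniform prior in the posterior samples, and the estimated complexity stays near one even though two parameters are nominally free. The widths, prior range and names are hypothetical.

```python
import random

def complexity(samples, chi2_fn):
    """Mean chi^2 over equal-weight samples minus chi^2 at the sample mean."""
    n = len(samples)
    mean_chi2 = sum(chi2_fn(s) for s in samples) / n
    mean_theta = [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]
    return mean_chi2 - chi2_fn(mean_theta)

SIGMA_A = 0.2   # illustrative posterior width of the constrained parameter

def chi2_one_direction(theta):
    a, _b = theta   # the second parameter never enters the likelihood
    return (a / SIGMA_A) ** 2

rng = random.Random(1)
# Posterior samples: a is Gaussian, b is spread over its whole prior range.
two_param_samples = [(rng.gauss(0.0, SIGMA_A), rng.uniform(-5.0, 5.0))
                     for _ in range(20000)]

c_0 = 2
c_b_toy = complexity(two_param_samples, chi2_one_direction)
print(f"C_0 = {c_0}, C_B = {c_b_toy:.2f}")
```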

It should be emphasised that estimates of the Bayesian complexity
cannot be used in isolation for model selection; blindly
choosing the model with the smallest complexity would simply
under-fit the data. Instead it provides a useful discriminator in cases
where the evidence difference between models is so small (say <
1 log unit on the Jeffreys' scale) that little inference can be drawn
with the evidence alone. Besides the most obvious scenario, where
both models are essentially equally informative, one can envisage the
case, as we had in the last section, where additional parameters are
simply left unconstrained by the data, such that in the evidence
integral this direction is simply averaged over. Here the complexity
can quantify whether or not the additional parameters have actually

Table 4. The reconstruction Bayesian complexity $C_B$ and actual number of
model parameters $C_0$.



Model    $C_0$    $C_B$
1        7        5.35 ± 0.10
2        8        6.35 ± 0.10
3        9        7.03 ± 0.10
4i       10       7.82 ± 0.10
4ii      10       7.18 ± 0.10
5i       11       8.04 ± 0.10
5ii      11       8.60 ± 0.10
5iii     11       8.37 ± 0.10
6i       12       8.04 ± 0.10
6ii      12       8.60 ± 0.10
6iii     12       8.37 ± 0.10
6iv      12       8.37 ± 0.10



Figure 7. Bayesian complexity $C_B$ versus actual number of model
parameters $C_0$ for models 1, 2, 3, 4i, 5i and 6i. Note how $C_B$ increases
almost linearly with $C_0$ until model 4i ($C_0 = 10$), when $C_B$ begins to
plateau in value as successively less well constrained parameters are added.



been constrained and thus extracted any further information from 
the data. 

Table 4 lists the recovered complexity for each of the
reconstructions tested. It should be noted that we have chosen quite a
generic background cosmology, accounting for both the possibility
of spatial curvature, via the $\Omega_k$ parameter, and the
marginalisation over a possible SZ contribution at high $\ell$, as was
done in Komatsu et al. (2008). Inclusion of recent LRG data, with their
associated tight constraints on $\Omega_k$, will minimise any effect on
$C_B$; however, $A_{SZ}$ remains essentially unconstrained by current data.
Thus it is not surprising to see our base, scale invariant model 1
having an effective complexity significantly less than $C_0$. This need
not concern us here, however, as we are primarily interested in the
relative difference of $C_B$ as we increase the reconstruction
complexity.

Since the evidence is maximised for model 3, this should be
our preferred parameterisation. Of course the Bayes' factor $B_{32}$
between models 3 and 2 is only ~ 0.4, or on the Jeffreys scale of
little significance, and since the Bayesian complexity for model 2
is significantly smaller (by ~ 0.7) than for model 3, should we then
argue that model 2 should in fact be preferred? Looking at the
marginalised posteriors of model 3, the fact that it is preferred is
not at all surprising, as the addition of the node at
$k \sim 0.01656$ Mpc$^{-1}$ leaves no amplitude constraint at $k_{max}$.
In effect the evidence is maximised for model 3 as it is a de facto two
parameter model. However, crucially, it provides the required tilt over
a k range that is well constrained by data and allows a deviation in
this tilt above $k \sim 0.01$ Mpc$^{-1}$. Further modelling of the upper
tilt, via say an extra node as we performed in model 4ii, was strongly
disfavoured, with $B_{3,4ii} \sim 2.5$. So the inclusion of complexity in
the analysis has not altered our general conclusions, as the evidence
difference between models 2 and 3 is minimal; it simply serves to
highlight the lack of significance placed by the data in anything other
than a tilted spectrum at present.

The complexity can further explain the degeneracy in evidence
values for those models where we introduced additional large scale
structure (e.g. 4i, 5i and 6i). Fig. 7 plots $C_B$ against $C_0$ for these
models (and for comparison the first three models). As we increase
the number of parameters in going from model 1 to 3 the Bayesian
complexity is seen to rise roughly linearly, from which we infer that
the data can usefully constrain all of the model parameters and thus
can warrant the additional parameterisation. This trend continues
to model 4i, but thereafter $C_B$ tends rapidly to a constant value of
~ 8, suggesting that the inclusion of extra parameters in models 5i
and 6i is superfluous. Thus, despite the indifference shown by the
evidence, the Bayesian complexity has successfully, and correctly,
relegated these models.
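
The plateau can be read off from the increments in complexity along the sequence of large scale models. The Python sketch below takes the central values from Table 4 and prints the change in $C_B$ for each added parameter. The quoted uncertainty on each difference assumes the ±0.10 errors are independent, which is an assumption of this sketch rather than a statement from the analysis.

```python
import math

# (model, C_0, C_B) central values copied from Table 4.
TABLE4_SEQUENCE = [
    ("1", 7, 5.35),
    ("2", 8, 6.35),
    ("3", 9, 7.03),
    ("4i", 10, 7.82),
    ("5i", 11, 8.04),
    ("6i", 12, 8.04),
]

# Error on a difference of two values with independent errors of 0.10 each.
DIFF_ERROR = math.sqrt(2.0) * 0.10

increments = [
    (later[0], round(later[2] - earlier[2], 2))
    for earlier, later in zip(TABLE4_SEQUENCE, TABLE4_SEQUENCE[1:])
]

for model, delta in increments:
    print(f"model {model}: delta C_B = {delta:+.2f} +/- {DIFF_ERROR:.2f}")
```

Each added parameter up to model 4i raises $C_B$ by roughly 0.7 to 1.0, whereas the steps to 5i and 6i add 0.22 and 0.00 respectively.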



8 CONCLUSIONS 

In this paper we have attempted to fit an optimal degree of
structure to the primordial power spectrum using Bayesian model
selection tools as our discriminant criteria. We find that a scale
invariant spectrum is significantly ruled out, the data instead
favouring a tilted spectrum, with perhaps some slight scale dependence
of $n_s$ located close to $k \sim 0.01$ Mpc$^{-1}$. We fail to find any
support in the data for further features beyond this simple scenario,
the optimal reconstruction fitting between just two and three
parameters. Previous authors (including ourselves) have regularly used
many more degrees of freedom, finding a number of 'interesting' features
in the process. In this analysis, by accounting for Occam's razor, we
have found no statistically significant structure much beyond a simple
tilt, and there is, we feel, limited point in attempting more complex
models at present, as the data simply cannot support them.